{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--BOOK_INFORMATION-->\n",
    "<a href=\"https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv\" target=\"_blank\"><img align=\"left\" src=\"data/cover.jpg\" style=\"width: 76px; height: 100px; background: white; padding: 1px; border: 1px solid black; margin-right:10px;\"></a>\n",
    "*This notebook contains an excerpt from the book [Machine Learning for OpenCV](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv) by Michael Beyeler.\n",
    "The code is released under the [MIT license](https://opensource.org/licenses/MIT),\n",
    "and is available on [GitHub](https://github.com/mbeyeler/opencv-machine-learning).*\n",
    "\n",
    "*Note that this excerpt contains only the raw code - the book is rich with additional explanations and illustrations.\n",
    "If you find this content useful, please consider supporting the work by\n",
    "[buying the book](https://www.packtpub.com/big-data-and-business-intelligence/machine-learning-opencv)!*"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Implementing a Multi-Layer Perceptron (MLP) in OpenCV](09.02-Implementing-a-Multi-Layer-Perceptron-in-OpenCV.ipynb) | [Contents](../README.md) | [Training an MLP in OpenCV to Classify Handwritten Digits](09.04-Training-an-MLP-in-OpenCV-to-Classify-Handwritten-Digits.ipynb) >"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "collapsed": true
   },
   "source": [
    "https://github.com/fchollet/keras/blob/master/examples/mnist_cnn.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Getting Acquainted with Deep Learning\n",
    "\n",
    "Back when deep learning didn't have a fancy name yet, it was called artificial neural\n",
    "networks. So you already know a great deal about it! This was a respected field in itself, but\n",
    "after the days of Rosenblatt's perceptron, many researchers and machine learning\n",
    "practitioners slowly began to lose interest in the field since no one had a good solution for\n",
    "training a neural network with multiple layers.\n",
    "\n",
    "With the current popularity of deep learning in both industry and academia, we are\n",
    "fortunate enough to have a whole range of open-source deep learning frameworks at our\n",
    "disposal:\n",
    "- **Google Brain's [TensorFlow](http://www.tensorflow.org)**: This is a machine learning library that describes computations as dataflow graphs. To date, this is one of the most commonly used deep learning libraries. Hence, it is also evolving quickly, so you might have to check back often for software updates. TensorFlow provides a whole range of user interfaces, including Python, C++, and Java interface.\n",
    "- **Microsoft Research's [Cognitive Toolkit (CNTK)](https://www.microsoft.com/en-us/research/product/cognitive-toolkit)**: This is a deep learning framework that describes neural networks as a series of computational steps via a directed graph.\n",
    "- UC Berkeley's [Caffe](http://caffe.berkeleyvision.org): This is a pure deep learning framework written in C++, with an additional Python interface.\n",
    "- **University of Montreal's [Theano](http://deeplearning.net/software/theano)**: This is a numerical computation library compiled to run efficiently on CPU and GPU architectures. Theano is more than a machine learning library; it can express any computation using a specialized computer algebra system. Hence, it is best suited for people who wish to write their machine learning algorithms from scratch.\n",
    "- **[Torch](http://www.torch.ch)**: This is a scientific computing framework based on the Lua programming language. Like Theano, Torch is more than a machine learning library, but it is heavily used for deep learning by companies such as Facebook, IBM, and Yandex.\n",
    "\n",
    "Finally, there is also [Keras](http://keras.io), which we will be using in the following sections. In contrast to\n",
    "the preceding frameworks, Keras understands itself as an interface rather than an end-toend\n",
    "deep learning framework. It allows you to specify deep neural nets using an easy-tounderstand\n",
    "API, which can then be run on backends, such as TensorFlow, CNTK, or\n",
    "Theano."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting acquainted with Keras\n",
    "\n",
    "The core data structure of Keras is a model, which is similar to OpenCV's classifier object,\n",
    "except it focuses on neural networks only. The simplest type of model is the `Sequential`\n",
    "model, which arranges the different layers of the neural net in a linear stack, just like we did\n",
    "for the MLP in OpenCV:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "Using Theano backend.\n"
     ]
    }
   ],
   "source": [
    "from keras.models import Sequential\n",
    "model = Sequential()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Then different layers can be added to the model one by one. In Keras, layers do not just\n",
    "contain neurons, they also perform a function. Some core layer types include the following:\n",
    "\n",
    "- `Dense`: This is a densely connected layer. This is exactly what we used when we designed our MLP: a layer of neurons that is connected to every neuron in the previous layer.\n",
    "- `Activation`: This applies an activation function to an output. Keras provides a whole range of activation functions, including OpenCV's identify function (`linear`), the hyperbolic tangent (`tanh`), a sigmoidal squashing function (`sigmoid`), a softmax function (`softmax`), and many more.\n",
    "- `Reshape`: This reshapes an output to a certain shape.\n",
    "\n",
    "There are other layers that calculate arithmetic or geometric operations on their inputs:\n",
    "- **Convolutional layers**: These layers allow you to specify a kernel with which the input layer is convolved. This allows you to perform operations such as a Sobel filter or apply a Gaussian kernel in 1D, 2D, or even 3D.\n",
    "- **Pooling layers**: These layers perform a max pooling operation on their input, where the output neuron's activity is given by the maximally active input neuron.\n",
    "\n",
    "Some other layers that are popular in deep learning are as follows:\n",
    "- `Dropout`: This layer randomly sets a fraction of input units to zero at each update. This is a way to inject noise into the training process, making it more robust.\n",
    "- `Embedding`: This layer encodes categorical data, similar to some functions from scikit-learn's `preprocessing` module.\n",
    "- `GaussianNoise`: This layer applies additive zero-centered Gaussian noise. This is another way of injecting noise into the training process, making it more robust."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A perceptron similar to the preceding one could thus be implemented using a\n",
    "`Dense` layer that has two inputs and one output. Staying true to our earlier\n",
    "example, we will initialize the weights to zero and use the hyperbolic tangent as\n",
    "an activation function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from keras.layers import Dense\n",
    "model.add(Dense(1, activation='linear', input_dim=2, kernel_initializer='zeros'))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, we want to specify the training method. Keras provides a number of optimizers,\n",
    "including the following:\n",
    "- **stochastic gradient descent** (`'sgd'`): This is what we have discussed before\n",
    "- **root mean square propagation** (`'RMSprop'`): This is a method in which the\n",
    "learning rate is adapted for each of the parameters\n",
    "- **adaptive moment estimation** (`'Adam'`): This is an update to root mean square propagation and many more\n",
    "\n",
    "In addition, Keras also provides a number of different loss functions:\n",
    "- **mean squared error** (`'mean_squared_error'`): This is what was discussed before\n",
    "- **hinge loss** (`'hinge'`): This is a maximum-margin classifier often used with SVM, as discussed in [Chapter 6](06.00-Detecting-Pedestrians-with-Support-Vector-Machines.ipynb), *Detecting Pedestrians with Support Vector Machines*, and many more"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "You can see that there's a whole plethora of parameters to be specified and methods to\n",
    "choose from. To stay true to our aforementioned perceptron implementation, we will\n",
    "choose stochastic gradient descent as an optimizer, the mean squared error as a cost\n",
    "function, and accuracy as a scoring function:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "model.compile(optimizer='sgd',\n",
    "              loss='mean_squared_error',\n",
    "              metrics=['accuracy'])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In order to compare the performance of the Keras implementation to our home-brewed\n",
    "version, we will apply the classifier to the same dataset:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "from sklearn.datasets.samples_generator import make_blobs\n",
    "X, y = make_blobs(n_samples=100, centers=2, cluster_std=2.2, random_state=42)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Finally, a Keras model is fit to the data with a very familiar syntax. Here, we can also choose\n",
    "how many iterations to train for (`epochs`), how many samples to present before we\n",
    "calculate the error gradient (`batch_size`), whether to shuffle the dataset (`shuffle`), and\n",
    "whether to output progress updates (`verbose`):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "<keras.callbacks.History at 0x7f00c46ad4a8>"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.fit(X, y, epochs=400, batch_size=100, shuffle=False, verbose=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After the training completes, we can evaluate the classifier as follows:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "\r",
      " 32/100 [========>.....................] - ETA: 0s"
     ]
    },
    {
     "data": {
      "text/plain": [
       "[0.040941802412271501, 1.0]"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "model.evaluate(X, y)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here, the first reported value is the mean squared error, whereas the second value denotes\n",
    "accuracy. This means that the final mean squared error was 0.04, and we had 100%\n",
    "accuracy. Way better than our own implementation!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "np.random.seed(1337)  # for reproducibility"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "With these tools in hand, we are now ready to approach a real-world dataset!"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<!--NAVIGATION-->\n",
    "< [Implementing a Multi-Layer Perceptron (MLP) in OpenCV](09.02-Implementing-a-Multi-Layer-Perceptron-in-OpenCV.ipynb) | [Contents](../README.md) | [Training an MLP in OpenCV to Classify Handwritten Digits](09.04-Training-an-MLP-in-OpenCV-to-Classify-Handwritten-Digits.ipynb) >"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.5.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}
